{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Resource estimation profiling in quantum adders\n",
    "\n",
    "In this sample, we are implementing two quantum adders and then inspecting them\n",
    "using the profiling feature in the Azure Quantum Resource Estimator.  The\n",
    "profiling feature allows you to analyze how sub-operations in the quantum\n",
    "algorithm impact the overall resources.\n",
    "\n",
    "In particular, you will learn how to use the `profiling.call_stack_depth` and\n",
    "`profiling.inline_functions` job parameters to enable profiles in resource\n",
    "estimation jobs, and how to use the `call_graph` and `profile` properties on\n",
    "resource estimation results to show call graphs and resource estimation\n",
    "profiles, respectively.\n",
    "\n",
    "The notebook is structured as follows.  First, we connect to an Azure Quantum\n",
    "workspace and import the necessary functions, then we describe the two quantum\n",
    "adder implementations.  It's okay to skip the implementation part and jump right\n",
    "into the next session, which shows how to use the profiling feature to construct\n",
    "call graphs and resource estimation profiles for time and space.  Finally, we\n",
    "show how the detailed profiling information can be used for advanced analyses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azure.quantum import Workspace\n",
    "from azure.quantum.target.microsoft import MicrosoftEstimator\n",
    "import pandas\n",
    "import qsharp\n",
    "import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "workspace = Workspace (\n",
    "    resource_id = \"\",\n",
    "    location = \"\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quantum adders\n",
    "\n",
    "In this section, we implement two Q# operations `RippleCarryAdd` and\n",
    "`LookAheadAdd` together with estimation entry points `EstimateRippleCarryAdd`\n",
    "and `EstimateLookAheadAdd`, respectively, which each can be provided a\n",
    "bit-width.  Feel free to skip over the implementation details of these adders,\n",
    "right to the next section in which we are using the profiling feature to analyze\n",
    "them.\n",
    "\n",
    "In the realm of quantum computing, the ability to perform addition is crucial\n",
    "for a wide range of applications. In this section, we will introduce two\n",
    "different quantum adders that can be used to perform addition: the ripple-carry\n",
    "adder and the carry-lookahead adder. Given two qubit $n$-bit registers\n",
    "$|x\\rangle = |(x_{n-1}\\dots x_1x_0)_2\\rangle$ and $|y\\rangle = |(y_{n-1}\\dots\n",
    "y_1y_0)_2\\rangle$, the goal is compute an $n+1$ bit register $|z\\rangle = |x +\n",
    "y\\rangle$.\n",
    "\n",
    "Both implementations will also support the overflowing variant, in which the\n",
    "output register has $n$ bits, and we compute $|z\\rangle = |(x + y) \\bmod\n",
    "2^n\\rangle$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ripple-carry adder\n",
    "\n",
    "We review the ripple-carry adder described in [Halving the cost of quantum\n",
    "addition](https://arxiv.org/abs/1709.06648).  We focus on the out-of-place\n",
    "variant, in which the sum of input registers $|x\\rangle$ and $|y\\rangle$ is\n",
    "computed into a $|0\\rangle$-initialized output register $|z\\rangle$.  A\n",
    "ripple-carry adder adds the bits of $|x\\rangle$ and $|y\\rangle$ one by one,\n",
    "starting from the rightmost bit, and propagates the carry generated by the\n",
    "addition to the next bit to the left. This process is repeated until all the\n",
    "bits have been added, resulting in a final sum and carry-out.  The core\n",
    "component of the ripple-carry adder is the full adder, which is a one-bit adder\n",
    "that takes a carry input bit, and returns a carry output bit, in addition to the\n",
    "sum bit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%qsharp\n",
    "internal operation FullAdder(carryIn : Qubit, x : Qubit, y : Qubit, carryOut : Qubit) : Unit is Adj {\n",
    "    CNOT(x, y);\n",
    "    CNOT(x, carryIn);\n",
    "    ApplyAnd(y, carryIn, carryOut);\n",
    "    CNOT(x, y);\n",
    "    CNOT(x, carryOut);\n",
    "    CNOT(y, carryIn);\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With this operation at hand, implementing the ripple carry adder is as simple as\n",
    "carrying the carry bit through successive full adder invocations.  The two CNOT\n",
    "operations at the end handle the case in which $|z\\rangle$ has $n$ bits, in\n",
    "which the sum for the most significant bit is not computed by a full adder\n",
    "(since the corresponding carry out bit does not exist)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%qsharp\n",
    "open Microsoft.Quantum.Arrays;\n",
    "open Microsoft.Quantum.Diagnostics;\n",
    "\n",
    "operation RippleCarryAdd(xs : Qubit[], ys : Qubit[], zs : Qubit[]) : Unit is Adj {\n",
    "    let n = Length(xs);\n",
    "    let m = Length(zs);\n",
    "\n",
    "    Fact(Length(ys) == n, \"Registers xs and ys must be of same length\");\n",
    "    Fact(m == n or m == n + 1, \"Register zs must be same length as xs or one bit larger\");\n",
    "\n",
    "    for k in 0..m - 2 {\n",
    "        FullAdder(zs[k], xs[k], ys[k], zs[k + 1]);\n",
    "    }\n",
    "\n",
    "    if n > 0 and n == Length(zs) {\n",
    "        CNOT(Tail(xs), Tail(zs));\n",
    "        CNOT(Tail(ys), Tail(zs));\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we are creating an entry point for resource estimation, in which we are\n",
    "calling the ripple-carry adder for a given bitwidth, which can be provided as\n",
    "an input argument. (We are considering the $n+1$ output register in this\n",
    "notebook.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%qsharp\n",
    "operation EstimateRippleCarryAdd(bitwidth : Int) : Unit {\n",
    "    use xs = Qubit[bitwidth];\n",
    "    use ys = Qubit[bitwidth];\n",
    "    use zs = Qubit[bitwidth + 1];\n",
    "\n",
    "    RippleCarryAdd(xs, ys, zs);\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Carry-lookahead adder\n",
    "\n",
    "In [_A logarithmic-depth quantum carry-lookahead adder_](https://arxiv.org/abs/quant-ph/0406142), the authors present an out-of-place quantum adder implementation based on the [classical carry-lookahead adder method](https://en.wikipedia.org/wiki/Carry-lookahead_adder).  To briefly summarize, the idea of carry-lookahead adder is to compute all carry bits $c_i$ based on propagate bits $p_i = x_i \\oplus y_i$ and generate bits $g_i = x_i \\land y_i$, without requiring other carry bits except for the carry-in $c_0$.\n",
    "\n",
    "For example, the first carry bit can be computed as $c_1 = g_0 \\oplus (p_0 \\land c_0)$, since either it is generated from bits $x_0$ and $y_0$ (when both are 1, and therefore $g_0 = 1$) or the carry bit $c_0$ is propagated (if either $x_0$ or $y_0$ is 1, and therefore $p_0 = 1$).  More significant carry bits are computed in a similar way, for example $c_3 = g_2 \\oplus (g_1 \\land p_2) \\oplus (g_0 \\land p_1 \\land p_2) \\oplus (c_0 \\land p_0 \\land p_1 \\land p_2)$.  That is, $c_3$ is either generated from bits at index 2, or generated from bits at index 1 _and_ propagated from bits at index 2, and so on.\n",
    "\n",
    "In order to minimize AND gates, these intermediate products can be computed in a clever way, as well as in logarithmic depth.  We are now looking at an implementation of the carry-lookahead adder in Q#, and start by implementing a helper function to compute the number of 1-bits in an integer, also called Hamming weight, using a compact implementation based on a sequence of bitwise manipulations (you can learn more about these constants and why it works [on this article in Wikipedia](https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%qsharp\n",
    "function HammingWeight(n : Int) : Int {\n",
    "    mutable i = n - ((n >>> 1) &&& 0x5555555555555555);\n",
    "    set i = (i &&& 0x3333333333333333) + ((i >>> 2) &&& 0x3333333333333333);\n",
    "    return (((i + (i >>> 4)) &&& 0xF0F0F0F0F0F0F0F) * 0x101010101010101) >>> 56;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we are going to implement internal routines that compute the generalized propagate bits, generalized generate, as well as the carry bits from them.  These are called `PRounds`, `GRounds`, and `CRounds`, and descriptions of their implementations can be found in the paper above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%qsharp\n",
    "open Microsoft.Quantum.Arrays;\n",
    "open Microsoft.Quantum.Convert;\n",
    "open Microsoft.Quantum.Math;\n",
    "\n",
    "internal operation PRounds(pWorkspace : Qubit[][]) : Unit is Adj {\n",
    "    for ws in Windows(2, pWorkspace) {\n",
    "        let (current, next) = (Rest(ws[0]), ws[1]);\n",
    "\n",
    "        for (m, target) in Enumerated(next) {\n",
    "            ApplyAnd(current[2 * m], current[2 * m + 1], target);\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "internal operation GRounds(pWorkspace : Qubit[][], gs : Qubit[]) : Unit is Adj {\n",
    "    let numRounds = Length(pWorkspace);\n",
    "    let n = Length(gs);\n",
    "\n",
    "    for t in 1..numRounds {\n",
    "        let length = Floor(IntAsDouble(n) / IntAsDouble(2^t)) - 1;\n",
    "        let ps = pWorkspace[t - 1][0..2...];\n",
    "\n",
    "        for m in 0..length {\n",
    "            CCNOT(gs[2^t * m + 2^(t - 1) - 1], ps[m], gs[2^t * m + 2^t - 1]);\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "internal operation CRounds(pWorkspace : Qubit[][], gs : Qubit[]) : Unit is Adj {\n",
    "    let n = Length(gs);\n",
    "\n",
    "    let start = Floor(Lg(IntAsDouble(2 * n) / 3.0));\n",
    "    for t in start..-1..1 {\n",
    "        let length = Floor(IntAsDouble(n - 2^(t - 1)) / IntAsDouble(2^t));\n",
    "        let ps = pWorkspace[t - 1][1..2...];\n",
    "\n",
    "        for m in 1..length {\n",
    "            CCNOT(gs[2^t * m - 1], ps[m - 1], gs[2^t * m + 2^(t - 1) - 1]);\n",
    "        }\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With these operations we can build an operation that computes the carry bits\n",
    "from the initial propagate and generate bits.  Note that the generalized\n",
    "propagate bits are computed out-of-place into some helper qubits `qs`, whereas\n",
    "the generalized generate and carry bits are computed in-place into the register\n",
    "`gs`, which contains the initial generate bits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%qsharp\n",
    "internal operation ComputeCarries(ps : Qubit[], gs : Qubit[]) : Unit is Adj {\n",
    "    let n = Length(gs);\n",
    "\n",
    "    let numRounds = Floor(Lg(IntAsDouble(n)));\n",
    "    use qs = Qubit[n - HammingWeight(n) - numRounds];\n",
    "\n",
    "    let registerPartition = MappedOverRange(t -> Floor(IntAsDouble(n) / IntAsDouble(2^t)) - 1, 1..numRounds - 1);\n",
    "    let pWorkspace = [ps] + Partitioned(registerPartition, qs);\n",
    "\n",
    "    within {\n",
    "        PRounds(pWorkspace);\n",
    "    } apply {\n",
    "        GRounds(pWorkspace, gs);\n",
    "        CRounds(pWorkspace, gs);\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we are all set up to implement the carry-lookahead adder, by first\n",
    "computing the initial propagate and generate bits, then computing the carry\n",
    "bits, and finally computing the output bits using the sums (initial propagate\n",
    "bits) together with the carry bits.  Note that this implementation supports both\n",
    "an variant where the output register is 1 bit larger and does not overflow, as\n",
    "well as a variant in which the sum is computed modulo $2^n$.  The latter uses\n",
    "the former by using a special treatment of the most-significant bits of the\n",
    "input registers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%qsharp\n",
    "open Microsoft.Quantum.Arrays;\n",
    "open Microsoft.Quantum.Canon;\n",
    "open Microsoft.Quantum.Diagnostics;\n",
    "\n",
    "operation LookAheadAdd(xs : Qubit[], ys : Qubit[], zs : Qubit[]) : Unit is Adj {\n",
    "    let n = Length(xs);\n",
    "    let m = Length(zs);\n",
    "\n",
    "    Fact(Length(ys) == n, \"Registers xs and ys must be of same length\");\n",
    "    Fact(m == n or m == n + 1, \"Register zs must be same length as xs or one bit larger\");\n",
    "\n",
    "    if m == n + 1 { // with carry-out\n",
    "        // compute initial generate values\n",
    "        for k in 0..n - 1 {\n",
    "            ApplyAnd(xs[k], ys[k], zs[k + 1]);\n",
    "        }\n",
    "\n",
    "        within {\n",
    "            // compute initial propagate values\n",
    "            ApplyToEachA(CNOT, Zipped(xs, ys));\n",
    "        } apply {\n",
    "            if n > 1 {\n",
    "                ComputeCarries(Rest(ys), Rest(zs));\n",
    "            }\n",
    "\n",
    "            // compute sum into carries\n",
    "            for k in 0..n - 1 {\n",
    "                CNOT(ys[k], zs[k]);\n",
    "            }\n",
    "        }\n",
    "    } else { // without carry-out\n",
    "        LookAheadAdd(Most(xs), Most(ys), zs);\n",
    "        CNOT(Tail(xs), Tail(zs));\n",
    "        CNOT(Tail(ys), Tail(zs));\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we are creating an entry point for resource estimation, in which we are\n",
    "calling the carry-lookahead adder for a given bitwidth, which can be provided as\n",
    "an input argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%qsharp\n",
    "operation EstimateLookAheadAdd(bitwidth : Int) : Unit {\n",
    "    use xs = Qubit[bitwidth];\n",
    "    use ys = Qubit[bitwidth];\n",
    "    use zs = Qubit[bitwidth + 1];\n",
    "\n",
    "    LookAheadAdd(xs, ys, zs);\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Profiling\n",
    "\n",
    "With these two adder implementations, we are now ready for some profiling.  The\n",
    "profiling feature in Azure Quantum Resource Estimator will create a resource\n",
    "estimation profile that will show how sub operations in the program (e.g.,\n",
    "`FullAdder` inside `RippleCarryAdd`) contribute to the overall costs.\n",
    "\n",
    "There are two important concepts to review to best understand the outputs of the\n",
    "profiling feature, a *call graph* and a *call tree*.  The *call graph* is static\n",
    "representation of the quantum program which informs which operations call which\n",
    "other operations.  For example, `RippleCarryAdder` calls `FullAdder`, but both\n",
    "of these operations call `CNOT`.  The call graph contains a node for each\n",
    "operation and a directed edge for each calling relation.  The call graph may\n",
    "contain cycles, e.g., in the case of recursive operations.\n",
    "\n",
    "In contrast, a call tree is a dynamic representation of the program execution in\n",
    "which there are no cycles and for each node there is a clear path from the root\n",
    "node.  For example, distinguishes the calls to `CCNOT` from `GRounds` and\n",
    "`CRounds` within the `ComputeCarries` operation in the carry-lookahead adder.\n",
    "\n",
    "But let's start looking at concrete examples, and create a resource estimator\n",
    "instance together with a parameter object to configure the bitwidth argument and\n",
    "enable the profiling feature."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "estimator = MicrosoftEstimator(workspace)\n",
    "\n",
    "params = estimator.make_params()\n",
    "params.arguments[\"bitwidth\"] = 32\n",
    "params.profiling.call_stack_depth = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We enable the profiling feature by setting the `call_stack_depth` variable in\n",
    "the `profiling` group to a number that indicates the maximum level up to which\n",
    "sub operations are tracked.  More precisely, the entry point operation is at\n",
    "level 0.  Any operation called from the entry point is at level 1, any operation\n",
    "therein at 2, and so on.  The call stack depth is setting a maximum value to an\n",
    "operation's level in the call stack for which we track resources in the profile.\n",
    "\n",
    "Next, we submit a resource estimation job for the ripple carry adder by\n",
    "providing the Q# operation `EstimateRippleCarryAdd` and the job parameter\n",
    "object.  We store there the results of the job in the variable `result_rca`,\n",
    "where RCA is an abbreviation for ripple-carry adder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "job = estimator.submit(EstimateRippleCarryAdd, input_params=params)\n",
    "result_rca = job.get_results()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can inspect the call graph by calling the `call_graph` property on the result\n",
    "object.  It displays the call graph with the node corresponding to the entry\n",
    "point operation at the top and aligns other operations top-down according to\n",
    "their level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result_rca.call_graph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the operation names are mangled, i.e., they contain additional\n",
    "information.  In this case, this can be a prefix `SNIPPET` or `ENTRYPOINT`,\n",
    "which is generated by `qsharp` Python library from the Q# that is integrated\n",
    "into the notebook.  Some operations are prefixed by their namespace, and\n",
    "operations have a suffix to indicate their variant (e.g., `body` for their\n",
    "`body` implementation, and `adj` for their `adjoint` implementation).\n",
    "\n",
    "As described above, we can see that `RippleCarryAdd` calls both the `FullAdder`\n",
    "and `CNOT`, and that the `FullAdder` also calls `CNOT` and `ApplyAnd`.  In fact,\n",
    "the `CNOT` operation is called from 3 other operations.\n",
    "\n",
    "Next, let's also generate resource estimates for the carry-lookahead\n",
    "implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "job = estimator.submit(EstimateLookAheadAdd, input_params=params)\n",
    "result_cla = job.get_results()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, we can look at its call graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result_cla.call_graph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This call graph is much larger since more operations are involved.  We can see\n",
    "the `PRounds`, `GRounds`, and `CRounds` operations, and see that `GRounds` and\n",
    "`CRounds` call `CCNOT`, but `PRounds` calls `ApplyAnd`.  We also see that the\n",
    "`adjoint` variant of `PRounds` calls the `adjoint` variant of `ApplyAnd`.\n",
    "\n",
    "It's possible to obtain a more compact call graph (and resource estimation\n",
    "profile) by inlining some functions, e.g., those that just call a different\n",
    "function and have no other logic inside.  We can use the `inline_functions`\n",
    "parameter for that:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "params.profiling.inline_functions = True\n",
    "job = estimator.submit(EstimateLookAheadAdd, input_params=params)\n",
    "result_cla_inline = job.get_results()\n",
    "result_cla_inline.call_graph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Although the call graph is now smaller, certain details—such as the rounds in\n",
    "`ComputeCarries`—have been inlined and are no longer available for analysis.\n",
    "Thus, the value for the `inline` parameter is typically chosen based on the\n",
    "desired type of analysis. For the purposes of our remaining analysis, we will be\n",
    "considering profiles for the adder operations without inlining."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we are inspecting the resource estimation profiles of both results.  These\n",
    "will provide the sub operations' contributions to runtime, logical depth,\n",
    "physical qubits for the algorithm, and logical qubits for the algorithm.  Let's\n",
    "first have an overview of the total counts for each result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "breakdown_rca = result_rca['physicalCounts']['breakdown']\n",
    "breakdown_cla = result_cla['physicalCounts']['breakdown']\n",
    "\n",
    "pandas.DataFrame(data = {\n",
    "    \"Runtime\": [f\"{result_rca['physicalCounts']['runtime']} ns\", f\"{result_cla['physicalCounts']['runtime']} ns\"],\n",
    "    \"Logical depth\": [breakdown_rca['logicalDepth'], breakdown_cla['logicalDepth']],\n",
    "    \"Physical qubits\": [breakdown_rca['physicalQubitsForAlgorithm'], breakdown_cla['physicalQubitsForAlgorithm']],\n",
    "    \"Logical qubits\": [breakdown_rca['algorithmicLogicalQubits'], breakdown_cla['algorithmicLogicalQubits']],\n",
    "}, index = [\"Ripple-carry adder\", \"Carry-lookahead adder\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The resource estimation profile is generated as a JSON file that can be read\n",
    "using the speedscope interactive online profile viewer.  In order to generate\n",
    "and download the file call the `profile` property on a result object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result_rca.profile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After downloading and opening the profile, we first see the runtime profile. For\n",
    "this ripple-carry adder, we can readily see that all the runtime cost is caused\n",
    "by the And operation inside the full adder.  No other operation contributes cost\n",
    "the overall runtime.  The top most box is the entry point operation and the root\n",
    "note of the call tree.  In the next layer below are all operations that are\n",
    "called from the root node and that contribute to the overall runtime.\n",
    "\n",
    "<center><img src=\"https://raw.githubusercontent.com/microsoft/Quantum/main/samples/azure-quantum/resource-estimation/profile_rca_runtime.png\" style=\"width: 910px\" /></center>\n",
    "\n",
    "On the top center is a button that displays _Runtime (1/4)_, which we can press\n",
    "to look at profiles for the other metrics.  If we click on physical algorithmic\n",
    "qubits, we get this profile:\n",
    "\n",
    "<center><img src=\"https://raw.githubusercontent.com/microsoft/Quantum/main/samples/azure-quantum/resource-estimation/profile_rca_qubits.png\" style=\"width: 910px\" /></center>\n",
    "\n",
    "For the number of qubits we account how many new qubits were allocated by some\n",
    "operation and track the maximum number of allocated qubits.  In this sample,\n",
    "`EstimateRippleCarryAdd` allocated all qubits for the adder and there are no\n",
    "additional helper qubits, e.g., in the implementation of `FullAdder`.  The entry\n",
    "point operation accounts for the additional qubits that are required for the\n",
    "padding in the 2D layout on the surface code.\n",
    "\n",
    "Let's also generate the profile for the carry-lookahead adder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result_cla.profile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The runtime profile for the carry-lookahead adder contains more operations:\n",
    "\n",
    "<center><img src=\"https://raw.githubusercontent.com/microsoft/Quantum/main/samples/azure-quantum/resource-estimation/profile_cla_runtime.png\" style=\"width: 910px\" /></center>\n",
    "\n",
    "We can see that the AND gates in the `LookAheadAdd` operation and the\n",
    "`ComputeCarries` operation contribute to the overall runtime.  Inside the\n",
    "`ComputeCarries` operation, we can analyze the contribution of the sub\n",
    "operations for the different rounds, and, e.g., note that the `adjoint` variant\n",
    "for the `PRounds` takes the fewest time, whereas all other 3 rounds are of\n",
    "similar complexity.\n",
    "\n",
    "In the qubit profile, we see that some additional helper qubits are allocated in\n",
    "the `ComputeCarries` operation to hold the auxiliary $p$ bits, that are computed\n",
    "out-of-place and required in the computation of the `GRounds` and `CRounds`.\n",
    "\n",
    "<center><img src=\"https://raw.githubusercontent.com/microsoft/Quantum/main/samples/azure-quantum/resource-estimation/profile_cla_qubits.png\" style=\"width: 910px\" /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced analysis\n",
    "\n",
    "In this notebook's final section, we will be conducting advanced analysis on the\n",
    "profiling data. Specifically, we will programmatically access the profiling data\n",
    "and examine the impact of `PRounds`, `GRounds`, and `CRounds` on the\n",
    "carry-lookahead adder's runtime for increasing bitwidths.\n",
    "\n",
    "To begin, we will establish parameters for a batching job with bitwidths ranging\n",
    "from $2^1 = 2$ to $2^{10} = 1024$, with power-of-2 steps. We will also set the\n",
    "call stack depth to 10, as demonstrated in the examples above. It's worth noting\n",
    "that while the call stack depth parameter applies to all items in the batching\n",
    "parameters, the bitwidth must be specified for each item individually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bitwidths = [2**k for k in range(1, 11)]\n",
    "params = estimator.make_params(num_items=len(bitwidths))\n",
    "\n",
    "params.profiling.call_stack_depth = 10\n",
    "for i, bitwidth in enumerate(bitwidths):\n",
    "    params.items[i].arguments['bitwidth'] = bitwidth"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are submitting the job and store its results in `results_all`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "job = estimator.submit(EstimateLookAheadAdd, input_params=params)\n",
    "results_all = job.get_results()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The raw profiling data can be accessed in JSON format via the `'profile'` key in\n",
    "a resource estimation result. For example, to access the profiling data for the\n",
    "first item in `results_all`, we would use `results_all[0]['profile']`. The\n",
    "format follows a custom scheme for speedscope and is documented in detail\n",
    "[here](https://github.com/jlfwong/speedscope/blob/master/src/lib/file-format-spec.ts),\n",
    "with the corresponding JSON schema available\n",
    "[here](https://www.speedscope.app/file-format-schema.json).\n",
    "   \n",
    "Each node in the tree is assigned a frame with an index, and the profile\n",
    "contains samples organized by calling order, with each sample assigned a weight\n",
    "(e.g. runtime). To locate the frame associated with a given round name (e.g.\n",
    "`PRounds`), the following Python function can be used: it finds the frame, then\n",
    "identifies all samples that contain it, and sums up the corresponding weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rounds_runtime(profile, round_name):\n",
    "    # Find the frame index which name contains `round_name` and ends in \"body\".\n",
    "    frame_index = [i for (i, frame) in enumerate(profile['shared']['frames']) if re.match(f'.*{round_name}.+body', frame['name'])][0]\n",
    "\n",
    "    # The runtime profile is the first profile\n",
    "    runtime_profile = profile['profiles'][0]\n",
    "\n",
    "    # Get variables to the samples and weights field of the runtime profile\n",
    "    samples = runtime_profile['samples']\n",
    "    weights = runtime_profile['weights']\n",
    "\n",
    "    # Sum up all the weights that correspond to samples that contain the operation,\n",
    "    # i.e., that contain the frame_index\n",
    "    return sum(weight for (sample, weight) in zip(samples, weights) if frame_index in sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now extract the profile for each job result item and use the `rounds_runtime`\n",
    "function to obtain the runtime for each round, add it to a data frame together\n",
    "with the total runtime and return a plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "entries = []\n",
    "for idx in range(len(results_all)):\n",
    "    bitwidth = bitwidths[idx]\n",
    "    result = results_all[idx]\n",
    "    profile = result['profile']\n",
    "\n",
    "    total_runtime = result['physicalCounts']['runtime']\n",
    "    entries.append([\n",
    "        rounds_runtime(profile, \"PRounds\"),\n",
    "        rounds_runtime(profile, \"GRounds\"),\n",
    "        rounds_runtime(profile, \"CRounds\"),\n",
    "        total_runtime\n",
    "    ])\n",
    "\n",
    "df = pandas.DataFrame(data=entries, index=bitwidths, columns=[\"PRounds\", \"GRounds\", \"CRounds\", \"Total\"])\n",
    "ax = df.plot(logx=True, xticks=bitwidths, xlabel=\"Bitwidth\", ylabel=\"Runtime (ns)\")\n",
    "# show all xticks\n",
    "from matplotlib.text import Text\n",
    "ax.set_xticklabels([Text(b, 0, b) for b in bitwidths]);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note how the total runtime grows much faster compared to the runtime of the\n",
    "rounds.  The reason is that we need $O(n)$ AND gates in the preparation part of\n",
    "`LookAheadAdd` but only $O(\\log(n))$ AND and CCNOT gates in the `ComputeCarries`\n",
    "operation.\n",
    "\n",
    "Further note that logical depth of a the carry-lookahead adder is also\n",
    "logarithmic in $n$, since on the logical level, all AND and CCNOT gates, in both\n",
    "the preparation parts and in the rounds can be applied in parallel.  However,\n",
    "when mapping to surface code operations using Parallel Synthesis Sequential\n",
    "Pauli Computation (PSSPC), these operations are sequentialized (see Appendix D\n",
    "in [Assessing requirements to scale to practical quantum\n",
    "advantage](https://arxiv.org/pdf/2211.07629.pdf))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next steps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great job! You now know how to use the profiling feature of the Azure Quantum\n",
    "Resource Estimator to analyze how different parts of your program contribute to\n",
    "overall resource estimates.\n",
    "\n",
    "To summarize, you can use the use the `profiling.call_stack_depth` and\n",
    "`profiling.inline_functions` job parameters to enable profiles in resource\n",
    "estimation jobs.  In the resource estimation results that you receive after\n",
    "successful job submission, the `call_graph` and `profile` properties show call\n",
    "graphs and resource estimation profiles, respectively.\n",
    "\n",
    "We encourage you to further explore this feature and try out the following\n",
    "ideas:\n",
    "   \n",
    "* Experiment with changing the `call_stack_depth` parameter\n",
    "* Investigate call graphs and profiles of recursive programs\n",
    "* Generate profiles from programs in other notebooks\n",
    "* Perform an advanced profile analysis to compare the number of helper qubits to\n",
    "  the number of input and output qubits in the carry-lookahead implementation"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}
